Abstract: The availability of large, inexpensive memory has made it possible to realize numerical functions, such as the reciprocal, square root, and trigonometric functions, using a look-up table. This is much faster than computing them in software. However, a naive look-up method requires unreasonably large memory. In this paper, we show the use of a look-up table (LUT) cascade to realize a piecewise linear approximation to the given function. Our approach requires memory of reasonable size and achieves high accuracy.

1 Introduction

Iterative algorithms have often been used to compute trigonometric functions like sin(x) and cos(x). Such algorithms are appropriate for hand calculators [8], where the input time by a human is much greater than the computation time. For example, the CORDIC (Coordinate Rotation Digital Computer) [1, 15] algorithm achieves accuracy with relatively little hardware by iteratively computing successively more accurate bits using a shift-and-add technique. This is slow compared to table lookup, in which an argument x is encoded as an n-bit number that is used as an address for f(x) in memory. In this case, the computation time is small and equal to one memory access. However, a naive table lookup can involve huge tables. For example, if x is represented as a 16-bit word and the results are realized by an 8-bit word, there are $8 \times 2^{16} = 2^{19}$ bits in total, a large number. In addition, there is much redundancy in the stored values, as the higher-order bits of the stored values are the same for nearby addresses. This has motivated the search for methods that achieve the high speed of table lookup with memories of reasonable size.

Hassler and Takagi [4] studied the problem of reducing the large size of a single lookup table by using two or more smaller lookup tables. Their approach applies to functions that can be represented as a converging series and uses the Partial Product Array (PPA), formed by multiplying together the various bits of the input variable x.

Stine and Schulte [13, 14] propose a technique that is based on the Taylor series expansion of a differentiable function. The first two terms of the expansion are realized and added using smaller lookup tables than needed in the naive method. Schulte and Swartzlander [12] consider algorithms for a family of variable-precision arithmetic function generators that produce an upper and a lower bound on the result, in effect carrying along the range over which the function is accurate. These algorithms have been simulated in behavioral-level VHDL.

Lee, Luk, Villasenor, and Cheung [6, 7] have proposed a non-uniform segmentation method for use in computing trigonometric and logarithmic functions by table lookup. Their algorithm places closely spaced points in regions where the change in function value is greatest. However, they used an ad hoc circuit to generate the non-uniform segmentation, and the segments were not optimized to the given function.
Rather than an ad hoc choice for this circuit, we propose a circuit, called a segment index encoder, that is specifically designed for the function. Toward this end, we propose an algorithm that derives a near-optimal segmentation intended to minimize the approximation error. Then, we show how to design a LUT cascade [5, 9, 10, 11] to implement the segment index encoder. The advantage of our approach is that approximations are more accurate over a wider class of functions.

To illustrate this, we analyze a wider class of functions, including sigmoid and entropy functions. Our approach can be applied to elementary functions (including trigonometric functions, transcendental functions, and the power function) and to non-elementary functions (including the normal distribution and the elliptic integral function). We do not require a converging series for the realized function, as in [4]. Further, we do not require that the function be differentiable, as in [13, 14]; rather, our approach can also be applied to functions that are piecewise differentiable, such as the sawtooth function.


2 The Problem

We could represent f(x) in a single memory, where x is applied as an address and the memory contents represent a binary value for f(x). Instead, we choose to find a piecewise linear approximation to f(x), where each segment is represented as $c_1 x + c_0$. In this case, we require a segment index encoder that converts the 16-bit representation of x into a q-bit code that is the segment index. This index is then applied to a much smaller memory that produces binary numbers for $c_1$ and $c_0$. This scheme requires a multiplier that computes $c_1 x$ and an adder that adds $c_0$ to $c_1 x$ to form f(x).
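The data path just described can be modelled behaviourally in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: breakpoints and coefficients are real numbers, the hypothetical function `make_evaluator` stands in for the whole generator, and a binary search (`bisect.bisect_right`) plays the role of the segment index encoder that the paper realizes in hardware.

```python
import bisect
from typing import Callable, List, Tuple

def make_evaluator(breakpoints: List[float],
                   coeffs: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Behavioural model of the datapath: encoder -> coefficient table -> c1*x + c0.

    breakpoints[i] is the x at which segment i+1 begins (hypothetical layout),
    so there is one (c1, c0) pair per segment: len(coeffs) == len(breakpoints) + 1.
    """
    if len(coeffs) != len(breakpoints) + 1:
        raise ValueError("need one (c1, c0) pair per segment")

    def f_approx(x: float) -> float:
        seg = bisect.bisect_right(breakpoints, x)  # segment index encoder
        c1, c0 = coeffs[seg]                        # coefficients memory
        return c1 * x + c0                          # multiplier and adder
    return f_approx

# Two segments that reproduce |x - 0.5| on [0, 1).
f = make_evaluator([0.5], [(-1.0, 0.5), (1.0, -0.5)])
print(f(0.25), f(0.75))  # 0.25 0.25
```

Using `bisect_right` places a value equal to a breakpoint in the later segment, which matches half-open segment ranges of the form "begin point ≤ x < next begin point".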

There are two important parts to this. First, we need a segmentation of f(x) that minimizes the error caused by representing a general function as a linear function $c_1 x + c_0$ on each segment. Second, we need a compact realization of the segment index encoder.

Consider the segmentation problem. Fig. 1 shows MATLAB's 'humps' function

$$f(x) = \frac{1}{(x - 0.3)^2 + 0.01} + \frac{1}{(x - 0.9)^2 + 0.04} - 6$$

and a piecewise linear approximation to it.

Figure 1: Segmentation of MATLAB's humps function (f(x) vs. x; number of segments = 32).

There are 32 segments. The maximum absolute error over all segments is small, 0.26557, and the error is distributed approximately uniformly across the segments. That is, the maximum error within each segment is close to 0.26557. Notice that narrow segments are needed around the left hump and, to a lesser extent, around the smaller right hump. These narrow segments produce nearly the same error as the wide segments in the approximately linear portion of the curve on the right. The problem of generating near-optimal segmentations of functions is discussed in Section 4.

The second problem, designing the segment index encoder, is complicated by the fact that different functions require different segmentations. In an implementation of the segmentation of the function in Fig. 1, the encoder converts a 16-bit input (the value of x) into a 5-bit output (the segment number), and is potentially a large circuit. In the next section, we present a design method for the encoder using the LUT cascade, such that the resulting circuit is small.
3 Architecture for Numerical Function Generator

3.1 Overview

Table 1 shows the notation used in this paper. The first row shows the real-valued single-variable function f(x) that our circuit approximates, where x is the independent variable. The second row shows the fixed-point numerical representation of x and f(x). To illustrate our approach, we have chosen to represent x in 16 bits and f(x) in 8 bits. Note that we use f(x) and x to denote both a real-valued function and its independent variable and their fixed-point representations; context determines which meaning is intended. We use X and F(X) to denote the ordered sets of logic variables and logic functions representing the fixed-point numbers x and f(x), respectively. That is, F(X) is a multiple-output logic function on X. It is the logic function that our proposed circuit implements.


Table 1: Notation

Type          Ind. Var.   Function   Example (Ind. Var.)   Example (Function)   # Bits (Ind. Var.)   # Bits (Function)
real-valued   x           f(x)       x = π/4 = 0.785398    cos(x) = 0.707107
fixed-point   x           f(x)       .1100100100001111     .10110101            16                   8
logic         X           F(X)       1100100100001111      10110101             16                   8
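The fixed-point entries of Table 1 can be reproduced with a small helper. The sketch assumes that real values are truncated (not rounded) to the stated number of fraction bits; truncation is an assumption on our part, but it reproduces both table entries, whereas rounding to nearest would change the low-order bits of x. `to_fixed_bits` is a hypothetical name.

```python
import math

def to_fixed_bits(value: float, frac_bits: int) -> str:
    """Truncate value in [0, 1) to frac_bits fraction bits and return the bit string."""
    if not 0.0 <= value < 1.0:
        raise ValueError("value must lie in [0, 1)")
    code = math.floor(value * (1 << frac_bits))   # truncation, not rounding
    return format(code, f"0{frac_bits}b")

x = math.pi / 4
print(to_fixed_bits(x, 16))            # 1100100100001111
print(to_fixed_bits(math.cos(x), 8))   # 10110101
```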

Figure 2: Architecture for the Numerical Function Generator.

3.2 Segment Index Encoder

The segment index encoder realizes the segment index function $g(x): [0, 1 - 2^{-16}] \to \{0, 1, 2, \ldots, p-1\}$ shown in Table 2. It assumes $0 \le x < 1$. Suppose that x is represented in 16 bits and we want to approximate f(x) using p segments. The segment index encoder then has 16 inputs and $q = \lceil \log_2 p \rceil$ outputs. The success of this approach depends on finding a compact circuit for the segment index encoder.


Fig. 2 shows the architecture used to implement the function. The independent variable x labels the 16 binary inputs that drive the Segment Index Encoder. The Encoder, in turn, produces the number of the segment in which this value of x lies. The segment number is applied to the Coefficients Table, which produces the slope $c_1$ and the intercept $c_0$ of the linear approximation $c_1 x + c_0$ to f(x) on this interval. A multiplier is needed to compute $c_1 x$, and an adder is needed to compute the sum $c_1 x + c_0$. The logic variables from the adder, labelled f(x), form the approximation to the function; f(x) is represented by 8 bits.


Table 2: Segment Index Function

Input Range                          Segment g(x)
$0 \le x < s_0$                      0
$s_0 \le x < s_1$                    1
$s_1 \le x < s_2$                    2
...                                  ...
$s_{p-2} \le x \le 1 - 2^{-16}$      $p - 1$
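Table 2 can be tabulated over every 16-bit input code to check that g is monotone increasing and to count the encoder outputs $q = \lceil \log_2 p \rceil$. In this sketch the breakpoints are assumed to be given as integer codes; the helper names and the four-segment example are hypothetical. The integer form `(p - 1).bit_length()` computes $\lceil \log_2 p \rceil$ exactly, without floating-point logarithms.

```python
import bisect
from typing import List

def segment_index_table(starts: List[int], n_bits: int = 16) -> List[int]:
    """Tabulate g over every n-bit input code.

    starts = [s_0, s_1, ..., s_{p-2}] are the (hypothetical) integer codes at which
    segments 1, ..., p-1 begin, so codes below s_0 fall in segment 0.
    """
    if any(a >= b for a, b in zip(starts, starts[1:])):
        raise ValueError("breakpoints must be strictly increasing")
    return [bisect.bisect_right(starts, code) for code in range(1 << n_bits)]

def output_bits(p: int) -> int:
    """Encoder outputs q = ceil(log2 p), computed exactly with integers."""
    return max(1, (p - 1).bit_length())

g = segment_index_table([16384, 32768, 49152])   # four equal segments of [0, 1)
print(g[0], g[16383], g[16384], g[-1])           # 0 0 1 3
print(all(a <= b for a, b in zip(g, g[1:])))     # True: monotone increasing
print(output_bits(4), output_bits(9), output_bits(32))  # 2 4 5
```

The last line gives 5 outputs for the 32 segments of Fig. 1 and 4 outputs for a 9-segment function.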

We propose the use of a LUT cascade [5, 9, 11] to realize the segment index encoder, as shown in Fig. 3. This maps X to S, where $S = (s_{q-1}, s_{q-2}, \ldots, s_0)$ represents the segment number $s_{q-1}2^{q-1} + s_{q-2}2^{q-2} + \cdots + s_0 2^0$. The LUT cascade realizes the segment index function shown in Table 2. This function is monotone increasing. That is, as we scan x in ascending order of values, the segment number never decreases. This property results in a LUT cascade of reasonable size, as we show in Lemma 1. We measure size by the number of bits of memory needed to store the cells' functions, summed over all cells in the LUT cascade. The size, in turn, depends on the number of rails, or interconnecting lines, between cells. This number can be determined from the decomposition chart of the function. This chart partitions the variables into two subsets. One subset corresponds to the variables on the input side of a set R of rails, and the other subset corresponds to the variables on the output side of R.

Figure 3: LUT Cascade Realization of the Segment Index Encoder.

Lemma 1: Let $(X_{high}, X_{low})$ be an ordered partition of X into two parts, where $X_{high} = (x_{n-1}, x_{n-2}, \ldots, x_{n-k})$ represents the most significant k bits of x ($x_{high}$), and $X_{low} = (x_{n-k-1}, x_{n-k-2}, \ldots, x_0)$ represents the least significant $n-k$ bits of x ($x_{low}$). Consider the decomposition chart of g(X) (representing a monotone increasing numerical function g(x)), where values of $X_{low}$ label the columns, values of $X_{high}$ label the rows, and entries are values of the p-valued segment index function. Its column multiplicity (the number of distinct columns) is at most p.

(Proof) Assume, without loss of generality, that the columns and rows are labelled in ascending order of the values of $x_{low}$ and $x_{high}$, respectively. Because g(x) is a monotone increasing function, in scanning left-to-right and then top-to-bottom, the values of g(x) never decrease; in particular, within each row the values never decrease from left to right. An increase in any row between two columns causes those columns to be distinct. Conversely, if no increase occurs in any row across two adjacent columns, they are identical.

In a monotone increasing p-valued output function, there are $p-1$ dividing lines among the $2^n$ output values. A dividing line either falls within a row, where it separates two adjacent columns, or coincides with a row boundary, where it separates no columns. Hence the columns split into at most p groups of consecutive identical columns. Thus, there can be at most p distinct columns. (Q.E.D.)
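Lemma 1 is easy to check numerically. The sketch below builds the decomposition chart exactly as in the lemma (rows labelled by the top k bits, columns by the bottom $n-k$ bits) and counts distinct columns. The 10-bit, 6-segment function is a hypothetical example chosen so that the bound is reached for k = 1; `column_multiplicity` is an illustrative name.

```python
import bisect
from typing import List

def column_multiplicity(g: List[int], n: int, k: int) -> int:
    """Number of distinct columns in the decomposition chart of g.

    Rows are labelled by X_high (the top k bits) and columns by X_low (the
    bottom n-k bits), so the entry in row h, column l is g[(h << (n-k)) | l].
    """
    low_bits = n - k
    columns = {
        tuple(g[(h << low_bits) | l] for h in range(1 << k))
        for l in range(1 << low_bits)
    }
    return len(columns)

# A hypothetical monotone segment index function on 10-bit inputs, p = 6.
n, starts = 10, [37, 200, 201, 513, 900]
g = [bisect.bisect_right(starts, x) for x in range(1 << n)]
p = len(starts) + 1
print(all(column_multiplicity(g, n, k) <= p for k in range(1, n)))  # True
print(column_multiplicity(g, n, 1))  # 6
```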

The significance of Lemma 1 is that a column multiplicity of at most p implies that at most $\lceil \log_2 p \rceil$ lines are needed between the blocks associated with $X_{low}$ and $X_{high}$. Such a low value implies that the individual cells have a small number of rails (interconnecting lines). As a result, the individual cells are reasonably simple. A formal statement of this is

Theorem 1: If the segment index function g(x) maps to at most p segments, then there exists a LUT cascade realizing g(x) in which the number of rails is at most $\lceil \log_2 p \rceil$.
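The construction behind Theorem 1 can be made concrete in Python. This sketch assumes that the least significant bits enter the leftmost cell, so that, as in Lemma 1, the rails only need to identify which class of identical $X_{low}$ columns has been seen so far. The cell boundaries `cuts`, the example segmentation, and the names `build_cascade` and `run_cascade` are hypothetical, and each cell is modelled as a dictionary rather than a memory array.

```python
import bisect
from typing import Dict, List, Tuple

Cell = Dict[Tuple[int, int], int]  # (incoming rail state, new input bits) -> output

def build_cascade(g: List[int], n: int, cuts: List[int]) -> Tuple[List[Cell], List[int]]:
    """Build LUT cells for g, feeding x to the cascade from its least significant bits.

    cuts[i] is how many low-order bits have been consumed after cell i, and
    cuts[-1] == n. Each rail state stands for one class of identical columns;
    the last cell outputs the segment number g(x) itself.
    Returns the cells and the number of rails leaving each non-final cell.
    """
    assert cuts[-1] == n and all(a < b for a, b in zip(cuts, cuts[1:]))
    reps = [0]          # one representative x_low for each incoming rail state
    prev = 0
    cells: List[Cell] = []
    rails: List[int] = []
    for k in cuts:
        table: Cell = {}
        classes: Dict[Tuple[int, ...], int] = {}
        new_reps: List[int] = []
        for state, rep in enumerate(reps):
            for bits in range(1 << (k - prev)):
                low = rep | (bits << prev)
                if k == n:                   # final cell: emit the segment number
                    table[(state, bits)] = g[low]
                    continue
                col = tuple(g[(h << k) | low] for h in range(1 << (n - k)))
                if col not in classes:       # first time this column class appears
                    classes[col] = len(new_reps)
                    new_reps.append(low)
                table[(state, bits)] = classes[col]
        cells.append(table)
        if k < n:
            reps = new_reps
            rails.append(max(1, (len(reps) - 1).bit_length()))  # ceil(log2 #classes)
        prev = k
    return cells, rails

def run_cascade(cells: List[Cell], cuts: List[int], x: int) -> int:
    """Push x through the cells, cell by cell, from the low bits upward."""
    state, prev = 0, 0
    for table, k in zip(cells, cuts):
        bits = (x >> prev) & ((1 << (k - prev)) - 1)
        state = table[(state, bits)]
        prev = k
    return state

n, starts = 10, [37, 200, 201, 513, 900]      # hypothetical segmentation, p = 6
p = len(starts) + 1
g = [bisect.bisect_right(starts, x) for x in range(1 << n)]
cuts = [4, 6, 8, 10]                          # hypothetical cell boundaries
cells, rails = build_cascade(g, n, cuts)
print(all(run_cascade(cells, cuts, x) == g[x] for x in range(1 << n)))  # True
print(max(rails) <= (p - 1).bit_length())                             # True
```

The class of a longer low-order prefix depends only on the class of the shorter prefix and the new bits, which is why each cell needs just its incoming rail state and its own inputs.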

As shown in Fig. 3, the outputs of each cell in the LUT cascade are partitioned into two parts: those that drive the next cell and those that are part of the segment number. For some cells, there may be no outputs that are part of the segment number. Indeed, our experience is that the leftmost cells tend not to produce segment number bits, and most such outputs come only from the rightmost cells. In the example we describe in Section 5, all segment number bits come from the single rightmost cell.

4 Segmentation Algorithm

Our approach to segmentation is based on the Douglas-Peucker [3] polyline simplification algorithm. This algorithm finds a piecewise linear approximation to a function f(x) recursively. First, it approximates f(x) by a single straight-line segment connecting the end points. Then, it finds the point P on the curve of f(x) that is farthest from this segment, measured along a line perpendicular to the segment. It then creates two straight-line segments joined at P and connecting to the end points. It proceeds in this manner, stopping when the maximum distance from each straight-line segment is below a given threshold. The Douglas-Peucker algorithm is used in rendering curves for graphics displays. For our purposes, however, we seek a piecewise linear approximation that minimizes the approximation error. That is, if $f_p(x)$ is the piecewise linear approximation to f(x), where p is the number of segments, we seek to minimize $\max_x |f(x) - f_p(x)|$. Thus, we have modified the Douglas-Peucker algorithm by replacing the perpendicular distance criterion with a minimum error criterion.
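One way to realize such a modification is sketched below in Python. It assumes that the "minimum error criterion" is the vertical error $|f(x) - f_p(x)|$ measured at sampled points, that each segment is the chord through its end points, and that splitting stops once every segment's error is at most a tolerance `tol`. The authors' exact criterion and stopping rule may differ; `segment_dp`, `cos_pi`, and the sample count are hypothetical.

```python
import math
from typing import Callable, List

def segment_dp(f: Callable[[float], float], a: float, b: float,
               tol: float, samples: int = 512) -> List[float]:
    """Douglas-Peucker-style recursive splitting of f on [a, b].

    The perpendicular distance of classic Douglas-Peucker is replaced by the
    vertical error |f(x) - chord(x)|; a segment is split at its worst sample
    until every segment's error is at most tol. Returns the breakpoints.
    """
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    ys = [f(x) for x in xs]

    def split(i: int, j: int) -> List[int]:
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        worst, k_worst = 0.0, -1
        for k in range(i + 1, j):
            err = abs(ys[k] - (ys[i] + slope * (xs[k] - xs[i])))
            if err > worst:
                worst, k_worst = err, k
        if worst <= tol:          # chord is accurate enough: keep this segment
            return [i, j]
        # split at the worst sample; drop the duplicated shared index
        return split(i, k_worst)[:-1] + split(k_worst, j)

    return [xs[k] for k in split(0, samples)]

def cos_pi(x: float) -> float:
    return math.cos(math.pi * x)

breakpoints = segment_dp(cos_pi, 0.0, 0.5, tol=2 ** -9)
print(len(breakpoints) - 1, "segments:", [round(x, 4) for x in breakpoints])
```

Splitting at the sample of largest vertical error, rather than largest perpendicular distance, makes the stopping test bound exactly the quantity $|f(x) - f_p(x)|$ at the sampled points.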

We have applied the modified Douglas-Peucker algorithm to the functions in Table 4, which lists common numeric functions, including transcendental functions, the entropy function, the sigmoid function, and the Gaussian function. The interval of x values is shown using the [a, b) notation, where a < b. Here, "[a" means the interval includes the smallest value a, and "b)" means the interval excludes the largest value b. In the binary number representation of x, we enforce "b)" by restricting the largest value of x to b minus the weight of the least significant bit (for example, $1 - 2^{-16}$ for a 16-bit fraction, as in Table 2).
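The [a, b) convention on a fixed-point grid can be computed exactly with rational arithmetic. In this sketch, the hypothetical helper `grid_for_half_open` returns the smallest and largest representable values and the number of grid points; when b lies on the grid, the largest value is b minus the weight of the least significant bit. Using `fractions.Fraction` avoids floating-point rounding at the interval ends.

```python
from fractions import Fraction
from typing import Tuple

def grid_for_half_open(a: Fraction, b: Fraction, frac_bits: int) -> Tuple[Fraction, Fraction, int]:
    """Smallest value, largest value and count of fixed-point points in [a, b)."""
    lsb = Fraction(1, 1 << frac_bits)      # weight of the least significant bit
    first = -((-a) // lsb)                 # ceil(a / lsb): first grid index >= a
    last = -((-b) // lsb) - 1              # ceil(b / lsb) - 1: last grid index < b
    return first * lsb, last * lsb, last - first + 1

lo, hi, count = grid_for_half_open(Fraction(0), Fraction(1), 16)
print(hi == 1 - Fraction(1, 2 ** 16), count)  # True 65536
```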

5 Example Design

In this section, we discuss in detail the design of the function generator for one function, cos(πx). Then, we summarize key features of the designs for all the functions implemented.

For the cos(πx) function, the input X has 16 variables and represents x to a precision of $2^{-16} \approx 1.5 \times 10^{-5}$. Using the Douglas-Peucker algorithm, we determined a segmentation into 9 segments, as shown in Table 3.

We briefly sketch the BDD-based design process described in [9]. Fig. 4 shows the BDD as a triangle. For each variable, there is an associated width that is shown next to the BDD. In this BDD, let $y_1$ be the top variable, $y_2$ the next variable, and so on. The width of a BDD at level k is the number of edges from nodes at level k or above to nodes lower in the BDD, where edges incident to the same lower node are counted as one. An order, from top to bottom, that produced small widths is $x_0, x_1, \ldots, x_{13}, s_3, s_2, s_1, s_0$. Note that only the end points of Table 3 need be used and, for these points, the two most significant bits are always 0. Therefore, only 14 bits ($x_0, x_1, \ldots, x_{13}$) of X are used.

Figure 4: BDD for the Segment Index Encoder for the cos(πx) Function.

Note that the width never exceeds 9. Thus, from Theorem 1, any partition yields a LUT cascade with at most $\lceil \log_2 9 \rceil = 4$ rails. The third column of Fig. 4 shows a repeated partitioning that yields four instances of a set of 4 rails. These separate 5 cells in the cascade that are used to realize the given function. Each cell has 6 inputs and 4 outputs. The resulting circuit is shown in Fig. 5.

Table 3: Segmentation for the cos(πx) Function.

Segment Begin Point                 Segment End Point                   Segment
in Decimal  in Binary               in Decimal  in Binary               Number
0.000000    0.000 0000 0000 0000    0.053314    0.000 0110 1101 0011    0000
0.053345    0.000 0110 1101 0100    0.107300    0.000 1101 1011 1100    0001
0.107330    0.000 1101 1011 1101    0.162994    0.001 0100 1101 1101    0010
0.163025    0.001 0100 1101 1110    0.219696    0.001 1100 0001 1111    0011
0.219727    0.001 1100 0010 0000    0.277740    0.010 0011 1000 1101    0100
0.277771    0.010 0011 1000 1110    0.307800    0.010 0111 0110 0110    0101
0.307831    0.010 0111 0110 0111    0.339386    0.010 1011 0111 0001    0110
0.339417    0.010 1011 0111 0010    0.406799    0.011 0100 0001 0010    0111
0.406830    0.011 0100 0001 0011    0.500000    0.100 0000 0000 0000    1000
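The binary column of Table 3 can be decoded mechanically to confirm that it matches the decimal column and that only the low 14 bits of the begin points vary. The listed values are consistent with reading each entry as one integer bit followed by 15 fraction bits; that reading is our assumption, inferred from the table itself. `BEGIN_BITS` and `decode` are hypothetical names.

```python
from typing import List, Tuple

# Binary begin points of segments 0-8, copied from Table 3.
BEGIN_BITS: List[str] = [
    "0.000 0000 0000 0000", "0.000 0110 1101 0100", "0.000 1101 1011 1101",
    "0.001 0100 1101 1110", "0.001 1100 0010 0000", "0.010 0011 1000 1110",
    "0.010 0111 0110 0111", "0.010 1011 0111 0010", "0.011 0100 0001 0011",
]

def decode(bits: str) -> Tuple[int, float]:
    """Return the 16-bit code and its value, assuming 1 integer + 15 fraction bits."""
    code = int(bits.replace(".", "").replace(" ", ""), 2)
    return code, code / 2 ** 15

codes = [decode(b)[0] for b in BEGIN_BITS]
print([round(decode(b)[1], 6) for b in BEGIN_BITS])
print(all(c >> 14 == 0 for c in codes))                # True: top two bits are 0
print(all(u < v for u, v in zip(codes, codes[1:])))    # True: begin points increase
```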


5.1 Memory Size Needed for the cos(πx) Function

We can compare this realization with the naive method on the basis of the number of bits of memory required. With the naive method, there is a large memory with $2^{16}$ locations, each producing 8 bits, for a total of $2^{19}$ = 524,288 bits. With the LUT cascade realization, there are 5 cells, each with a memory of $4 \times 2^6 = 256$ bits, for a total of $5 \times 256 = 1280$ bits. The coefficients memory has a 4-bit address and stores 9 segment coefficient pairs. The two coefficients, $c_1$ and $c_0$, are each represented in 10 bits, for a total of $9 \times (10 + 10) = 180$ bits. Totalling the LUT cascade and coefficients memory yields $1280 + 180 = 1460$ bits. It should be noted that the approximation method we propose requires a LUT cascade, a multiplier, and an adder that are not present in the naive realization, and these contribute delay. However, a larger memory is likely to be slower than the much smaller memories required in the approximation approach. The memory reduction is significant: the total is less than 1/350 of the memory size for the naive method (1460 vs. 524,288 bits)!

Figure 5: Segment Index Encoder for the cos(πx) Function Realized by a LUT Cascade.
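The memory count above is simple arithmetic; the short Python sketch below reproduces it so that other cell shapes can be tried. It assumes, as in the text, that a cell with i inputs and o outputs stores $2^i \cdot o$ bits; `cascade_bits` is a hypothetical helper.

```python
from typing import List, Tuple

def cascade_bits(cells: List[Tuple[int, int]]) -> int:
    """Total LUT memory: each cell with i inputs and o outputs stores 2**i * o bits."""
    return sum((1 << i) * o for i, o in cells)

naive = (1 << 16) * 8                  # one 8-bit word for each of 2**16 addresses
cascade = cascade_bits([(6, 4)] * 5)   # five cells, 6 inputs and 4 outputs each
coefficients = 9 * (10 + 10)           # 9 segments, 10-bit c1 and 10-bit c0
total = cascade + coefficients
print(naive, cascade, coefficients, total)  # 524288 1280 180 1460
print(round(naive / total, 1))              # 359.1
```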

5.2 Summary of Memory Requirements for Numerical Functions

Table 4 summarizes the results of the design process just described. It shows that the memory sizes across the various functions are small, ranging from less than 100 bits to approximately 4000 bits. In forming the LUT cascades, we did not minimize the memory. For some functions, the minimum memory corresponded to a cascade with 9 cells.

In Table 4, the functions listed all have a 16-bit input value x. However, over the range of values specified in Table 4, the most significant bit is constant, and, in the case of the tan(πx) function, the two most significant bits are constant. Thus, in comparing with the naive method, one must consider a memory with $2^{15}$ or $2^{14}$ words, as appropriate.
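The number of constant leading bits, and hence the size of the naive table to compare against, follows from the lowest and highest input codes of the interval: every code in a contiguous range shares the common prefix of its two end codes. This sketch assumes a 16-bit input with one integer bit and 15 fraction bits, as the binary column of Table 3 suggests; `constant_msbs` and `naive_bits` are hypothetical helpers.

```python
def constant_msbs(lo_code: int, hi_code: int, n_bits: int = 16) -> int:
    """Leading bits shared by every code in [lo_code, hi_code] (a contiguous range)."""
    if not 0 <= lo_code <= hi_code < (1 << n_bits):
        raise ValueError("codes must satisfy 0 <= lo <= hi < 2**n_bits")
    return n_bits - (lo_code ^ hi_code).bit_length()

def naive_bits(lo_code: int, hi_code: int, out_bits: int = 8, n_bits: int = 16) -> int:
    """Size of a single look-up table that ignores the constant leading bits."""
    return (1 << (n_bits - constant_msbs(lo_code, hi_code, n_bits))) * out_bits

# Assumed input format: 1 integer bit + 15 fraction bits.
one, two = 1 << 15, 1 << 16
print(constant_msbs(one, two - 1), naive_bits(one, two - 1))            # 1 262144  (x in [1, 2))
print(constant_msbs(0, one // 2 - 1), naive_bits(0, one // 2 - 1))      # 2 131072  (x in [0, 0.5))
```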

There is some correlation between the total memory size, shown in the rightmost column, and the number of segments, shown in the fourth column from the left. Enlarging the domain increases the number of segments and thus the memory size. Besides the domain size, the memory size depends on the function realized. For example, the Gaussian distribution (last line of Table 4) has surprisingly low memory requirements.


6 Summary and Conclusions

We have shown a design method for circuits that compute elementary and non-elementary functions quickly and accurately. It is based on the piecewise linear approximation of the function. The effectiveness of this approach lies in two contributions: 1. an approximation algorithm of high accuracy, and 2. the use of a LUT cascade in a compact realization of the segment index encoder. The latter converts a binary representation of x into a binary representation of the segment number. Each segment number is an address to a reasonably small memory, which provides the coefficients of the corresponding segment.

The previous approach [6, 7] used an ad hoc circuit to generate the segmentation, so such a method is useful only for a limited class of functions. In contrast, our approach uses a LUT cascade, a universal circuit that can realize an optimized segmentation for wider classes of functions.

Extensions of this work include the use of: 1. a scaling factor (shifter) for functions with a large dynamic range, 2. higher-order approximations ($\sum_{i=0}^{k} c_i x^i$ with $k > 1$) to reduce the approximation error within each segment, and 3. improved segmentation algorithms.
Table 4: Comparison of Sizes of Memory for Various Functions, Where x and f(x) are Realized in 16 and 8 Bits.

Function   Interval of x   Interval of f(x)   # Seg   # Bits c1   # Bits c0   # Cell Inputs / # Cell Outputs   LUT Mem   Coef Mem   Total Mem
2^x        [0,1]           [1,2]              7       10          10          5,5,5,5,5,5 / 3,3,3,3,3          576       140        716
1/x        [1,2)           (1/2,1]            ?       10          10          ? / 3,3                          576       160        736
√x         ?               ?                  ?       ?           ?           ? / ?                            2900      558        3458
?          ?               ?                  ?       10          10          5,5 / 2,2,2,2,2                  268       80         348
log2(x)    [1,2)           [0,1)              5       10          10          5,5,5,5,5,5 / 3,3,3,3,3          ?         100        ?
?          ?               ?                  8       10          10          5,5,5,5,5 / 3,3,3                ?         ?          ?